## IMPORTANT: On Colab, we expect your homework to be in the cs189 folder
## Please contact staff if you encounter any problems with installing dependencies
import sys, os
IS_COLAB = 'google.colab' in sys.modules
if IS_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    %cd /content/drive/MyDrive/cs189/hw/hw1
    %pip install -r ./requirements.txt
    !pip install -U kaleido plotly

import plotly.io as pio

# Import kaleido to ensure it's available and properly initialized
try:
    import kaleido
    # Download the headless Chrome build that kaleido uses for static image export
    kaleido.get_chrome_sync()
    # Set renderer to use PNG output with kaleido
    pio.renderers.default = "plotly_mimetype+notebook+png"
    print("✓ Kaleido is available, using PNG renderer")
except ImportError:
    # Fall back to HTML if kaleido is not available
    pio.renderers.default = "plotly_mimetype+notebook"
    print("⚠ Kaleido not found, using HTML renderer")
except Exception as e:
    # Any other initialization error: use the HTML renderer
    pio.renderers.default = "plotly_mimetype+notebook"
    print(f"⚠ Error initializing kaleido: {e}, using HTML renderer")
✓ Kaleido is available, using PNG renderer
# Initialize Otter
import otter
grader = otter.Notebook("fashion_pt_1.ipynb")
Homework 1.1 – AGI, Everywhere, All at Once
Welcome to Homework 1.1! In this assignment, you will get familiar with common data and visualization tools like numpy, pandas, and plotly. This notebook emphasizes pandas operations throughout, and you will work with DataFrames as your primary data structure.
Due Date: Friday, September 19, 11:59 PM¶
This assignment is due on Friday, September 19, at 11:59 PM. You must submit your work to Gradescope by this deadline. Please refer to the syllabus for the Slip Day policy. No late submissions will be accepted beyond the details outlined in the Slip Day policy.
Submission Tips:¶
- Plan ahead: We strongly encourage you to submit your work several hours before the deadline. This will give you ample time to address any submission issues.
- Reach out for help early: If you encounter difficulties, contact course staff well before the deadline. While we are happy to assist with submission issues, we cannot guarantee responses to last-minute requests.
Assignment Overview¶
This notebook contains a series of tasks designed to help you practice and apply key concepts in data manipulation and visualization. You will complete all the TODOs in the notebook, which include both coding and written response questions. Some tasks are open-ended, which allows you to explore and experiment with different approaches.
Key Learning Objectives:¶
- Work with numpy and pandas for data manipulation.
- Visualize data using plotly and pandas' built-in plotting functions.
- Gain experience with organizing and analyzing datasets.
- Understand the importance of data exploration and preprocessing.
Grading Breakdown¶
| Question | Manual Grading? | Points |
|---|---|---|
| 0a | No | 1 |
| 1a | No | 1 |
| 1b | No | 1 |
| 1c | Yes | 1 |
| 1d | No | 1 |
| 2a | No | 2 |
| 2b | No | 1 |
| 2c | Yes | 1 |
| 2d | Yes | 2 |
| 3a | No | 2 |
| 3b | No | 2 |
| 3c | No | 1 |
| 3d | Yes | 2 |
| 3e | No | 2 |
| 3f | No | 1 |
| 3g | No | 1 |
| 3h | Yes | 1 |
| 3i | No | 1 |
| 3j | Yes | 1 |
| 4a | No | 1 |
| 4b | No | 2 |
| 4c | No | 2 |
| 4d | No | 2 |
| 4e | No | 2 |
| 4f | No | 1 |
| 4g | Yes | 2 |
| 4h | No | 1 |
| 4i | No | 2 |
| 4j | Yes | 2 |
| Total | | 42 |
Note: "Manual" questions are written response questions that will be graded manually by the course staff. All other questions will be graded automatically by the autograder.
Instructions:¶
- Carefully read each question and its requirements.
- Complete all TODOs in the notebook. You may add extra lines of code if needed to implement your solution.
- For manual questions, provide clear and concise written responses.
- Test your code thoroughly to ensure it meets the requirements.
Good luck!
import numpy as np
import pandas as pd
import plotly.express as px
import torchvision
import os
import random
from IPython.display import display
IMPORTANT:¶
- Do not change the random seed values!
- Before you submit your notebook, remember to set save_models=True and load_saved_models=True. This saves your final model, which we will use for the autograder. Set these to False if you are still tweaking your model setup. We have provided code for saving models - do not change the file names!
- When uploading your notebook, make sure to include your model file classifier.joblib in your submission.
# Set random seeds for reproducible results
SEED = 189
np.random.seed(SEED)
random.seed(SEED)
# IMPORTANT: set save_models to True to save trained models. YOU NEED TO DO THIS FOR THE AUTOGRADER TO WORK.
import joblib
save_models = True
load_saved_models = True # After training, you can set this to True to load the saved models and not have to re-train them.
Setup¶
Load the Fashion-MNIST dataset¶
In this homework, we will work with the Fashion-MNIST dataset, a widely used benchmark dataset for machine learning. It consists of grayscale 28x28 pixel images of various articles of clothing, making it an excellent dataset for practicing image classification.
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. https://github.com/zalandoresearch/fashion-mnist
This dataset serves as an alternative to the classic MNIST digits dataset, which contains images of handwritten digits. Fashion-MNIST is more challenging and better reflects real-world image classification tasks.
We will load the dataset using torchvision, a PyTorch library that provides popular datasets, models, and transformation tools. While you don't need to fully understand PyTorch for this homework, it's helpful to know that the dataset contains two key components:
- data: the images themselves, represented as 28x28 grayscale arrays.
- targets: the class labels for each image, where each label corresponds to a specific article of clothing.
The dataset includes 10 classes, each representing a type of clothing item:
- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
We will explore this dataset in detail and use it to practice data manipulation, visualization, and machine learning techniques.
# Load the FashionMNIST dataset from torchvision
train_data = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True)
# Extract the image data and convert it to a numpy array of type float
images = train_data.data.numpy().astype(float)
# Extract the target labels as a numpy array
targets = train_data.targets.numpy()
# Create a dictionary mapping class indices to class names
class_dict = {i: class_name for i, class_name in enumerate(train_data.classes)}
# Map the target labels to their corresponding class names
labels = np.array([class_dict[t] for t in targets])
# Create a list of class names in order of their indices
class_names = [class_dict[i] for i in range(len(class_dict))]
# Get the total number of samples in the dataset
n = len(images)
# Ensure class_names is a list of class names (redundant but ensures consistency)
class_names = list(class_dict.values())
# Print dataset information for verification
print("Loaded FashionMNIST dataset with {} samples.".format(n))
print("Classes: {}".format(class_dict))
print("Image shape: {}".format(images[0].shape)) # Shape of a single image
print("Image dtype: {}".format(images[0].dtype)) # Data type of the image array
print("Image type: {}".format(type(images[0]))) # Type of the image object
Loaded FashionMNIST dataset with 60000 samples.
Classes: {0: 'T-shirt/top', 1: 'Trouser', 2: 'Pullover', 3: 'Dress', 4: 'Coat', 5: 'Sandal', 6: 'Shirt', 7: 'Sneaker', 8: 'Bag', 9: 'Ankle boot'}
Image shape: (28, 28)
Image dtype: float64
Image type: <class 'numpy.ndarray'>
Now let's create a DataFrame to organize our data
In this class, we will be using a lot of pandas, which is a powerful library for data analysis and manipulation. A DataFrame in pandas is essentially a table where we can store and perform operations on our data.
Why use a DataFrame?¶
A DataFrame allows us to:
- Organize data into rows and columns for better readability.
- Perform efficient operations on the data, such as filtering, grouping, and aggregating.
- Integrate seamlessly with other libraries for visualization and machine learning.
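As a quick toy sketch (hypothetical data, not the homework DataFrame), the three operations above look like this:

```python
import pandas as pd

# A hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({
    "label": ["Bag", "Bag", "Coat", "Coat", "Coat"],
    "pixel_mean": [10.0, 12.0, 30.0, 28.0, 32.0],
})

# Filtering: keep only the rows labeled "Coat"
coats = toy[toy["label"] == "Coat"]

# Grouping + aggregating: average pixel_mean per label
per_label = toy.groupby("label")["pixel_mean"].mean()

print(coats.shape)       # (3, 2)
print(per_label["Bag"])  # 11.0
```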
Problem 0a¶
Task: Create a DataFrame called df with two columns: image and label. Each row should correspond to an image and its associated label. You can preview the first 5 rows of a DataFrame by calling df.head().
Hints:
- What is the current object type of the variable images? Note that pandas expects 1D or 2D data for each value in a DataFrame column. You may need to first convert images to a Python list before using it to create the DataFrame.
- Later on, when we use our DataFrame for training, it's best if the values in the image column are ndarray objects. After creating the DataFrame, consider re-casting all the values in the image column to ndarray for consistency.
# TODO: Create a DataFrame with two columns: `image` and `label`
# Convert images to a list (pandas expects 1D or 2D data for each value)
images_list = [img for img in images]
# Create DataFrame with image and label columns
df = pd.DataFrame({
'image': images_list,
'label': labels
})
# Re-cast image column values to numpy arrays for consistency
df['image'] = df['image'].apply(lambda x: np.array(x))
# Print the shape and columns of the DataFrame
print("DataFrame shape:", df.shape)
print("DataFrame columns:", df.columns.tolist())
df.head()
DataFrame shape: (60000, 2)
DataFrame columns: ['image', 'label']
| image | label | |
|---|---|---|
| 0 | [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... | Ankle boot |
| 1 | [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,... | T-shirt/top |
| 2 | [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... | T-shirt/top |
| 3 | [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 33.0... | Dress |
| 4 | [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... | T-shirt/top |
grader.check("q0")
q0
passed! 💯
Problem 1: Introduction to pandas and Plotly¶
Now that we have created our DataFrame, let's start analyzing our data. A key aspect of machine learning is understanding the data you are working with, so let's create some visualizations of our dataset.
One of the first steps in data analysis is to check how "balanced" the dataset is. This means examining the distribution of the labels to see if each class appears equally in the dataset. A balanced dataset ensures that no class is overrepresented or underrepresented, which can impact the performance of machine learning models.
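To make the min/max comparison concrete, here is a toy sketch on hypothetical labels (not the Fashion-MNIST data):

```python
import pandas as pd

# Hypothetical labels for a tiny imbalanced dataset (illustration only).
toy_labels = pd.Series(["cat", "dog", "cat", "dog", "cat"])

counts = toy_labels.value_counts()
toy_balanced = counts.min() == counts.max()
print(toy_balanced)  # False: 3 cats vs. 2 dogs
```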
Problem 1a: Checking Dataset Balance¶
Task: Calculate the distribution of the label column in the df DataFrame using value_counts() and store it in a variable called label_distribution. Then, determine whether or not our dataset is balanced by comparing the minimum and maximum values of label_distribution. Store the result as a boolean value in the is_balanced variable.
# TODO: Calculate the distribution of labels using `value_counts()`
# TODO: Compare the min and max values of `label_distribution` to determine if the dataset is balanced.
# Calculate label distribution using value_counts()
label_distribution = df['label'].value_counts()
# Determine if dataset is balanced by comparing min and max values
# Dataset is balanced if min == max (all classes have the same count)
is_balanced = label_distribution.min() == label_distribution.max()
print(f"Label distribution:\n{label_distribution}")
print(f"Is the dataset balanced? {is_balanced}")
Label distribution:
label
Ankle boot     6000
T-shirt/top    6000
Dress          6000
Pullover       6000
Sneaker        6000
Sandal         6000
Trouser        6000
Shirt          6000
Coat           6000
Bag            6000
Name: count, dtype: int64
Is the dataset balanced? True
grader.check("q1a")
q1a
passed! 🚀
Problem 1b: Grouping Data with groupby()¶
The groupby() function in pandas is a powerful tool for grouping rows based on column values and applying aggregation functions like .size().
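For example, on a small hypothetical frame (not the homework data):

```python
import pandas as pd

# A toy frame, just to show the groupby/size pattern.
toy = pd.DataFrame({"label": ["A", "B", "A", "A", "B"]})
sizes = toy.groupby("label").size()
print(sizes["A"], sizes["B"])  # 3 2
```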
Task:
Group df by the label column and count the rows in each group using .size().
Example Output:¶
| label | count |
|---|---|
| Ankle boot | 6000 |
| Bag | 6000 |
| ... | ... |
# TODO: Group the rows in `df` according to the values in the `label` column. Then, count the number of rows in each group.
label_distribution_groupby = df.groupby('label').size()
grader.check("q1b")
q1b
passed! 🌈
Problem 1c: Visualizing Label Distribution¶
One of the strengths of pandas is its ability to quickly generate visualizations of data. This is particularly useful for understanding the distribution of your dataset. In this task, we will use pandas' built-in plotting functions to create a visualization of the label distribution in our DataFrame.
Why Visualize Label Distribution?¶
Visualizing the label distribution helps us:
- Understand the balance of classes in the dataset.
- Identify any potential biases or imbalances that could affect model performance.
- Gain insights into the dataset before proceeding with further analysis.
Task:
- Use the pandas built-in plotting functions to create a histogram of the label distribution (x-axis: class label, y-axis: sample count).
- Ensure the chart is clear and labeled appropriately for easy interpretation.
# Plotting library to use, default is matplotlib but plotly has more functionality
pd.options.plotting.backend = "plotly"
# TODO: Plot a histogram of the labels in the DataFrame `df` using the DataFrame's built-in plotting functions (this should be 1 line)
fig = df['label'].value_counts().plot(kind='bar', title='Label Distribution')
fig.update_layout(xaxis_title='Class Label', yaxis_title='Sample Count')
# Display the figure (will use HTML rendering in notebook)
fig
As a quick refresher, here is the show_images function from lecture. This function visualizes our images and labels each of them with what class they are from.
def show_images(images, max_images=40, ncols=5, labels=None, reshape=False):
    """Visualize a subset of images from the dataset.
    Args:
        images (np.ndarray or list): Array of images to visualize [img,row,col].
        max_images (int): Maximum number of images to display.
        ncols (int): Number of columns in the grid.
        labels (np.ndarray, optional): Labels for the images, used for facet titles.
        reshape (bool): If True, reshape flattened images back to (28, 28).
    Returns:
        plotly.graph_objects.Figure: A Plotly figure object containing the images.
    """
    if isinstance(images, list):
        images = np.stack(images)
    n = min(images.shape[0], max_images)  # Number of images to show
    px_height = 220  # Height of each image in pixels
    if reshape:
        images = images.reshape(images.shape[0], 28, 28)
    fig = px.imshow(images[:n, :, :], color_continuous_scale='gray_r',
                    facet_col=0, facet_col_wrap=ncols,
                    height=px_height * int(np.ceil(n / ncols)))
    fig.update_layout(coloraxis_showscale=False)
    fig.update_xaxes(showticklabels=False, showgrid=False)
    fig.update_yaxes(showticklabels=False, showgrid=False)
    if labels is not None:
        # Extract the facet number and replace it with the label.
        fig.for_each_annotation(lambda a: a.update(text=labels[int(a.text.split("=")[-1])]))
    return fig
Problem 1d: Visualizing Class Examples¶
To better understand the dataset, let's visualize a few examples from each class. This will help us see what the images look like and how they differ across classes.
Task:
- Use the pandas groupby function to group the DataFrame by the label column.
- Sample 2 images per class.
- Use the show_images function to display the images in a grid, with each image labeled by its class name.
# TODO: Get 2 sample images per class and plot them.
# Group by label and sample 2 images from each class, keeping classes together
examples = df.groupby('label').sample(n=2, random_state=SEED).sort_values('label').reset_index(drop=True)
fig = show_images(examples["image"].tolist(), ncols=4, labels=examples["label"].tolist())
fig.show()
grader.check("q1d")
q1d
passed! 🎉
Problem 2: Understanding Data Structure with Clustering¶
Before training classifiers, we explore the data structure using k-means clustering, an unsupervised learning method. This helps identify patterns and relationships in the dataset.
Why Clustering?
- Discover Similarities: Group similar clothing items based on pixel values.
- Data Insights: Understand dataset structure to guide modeling.
- Simplify Data: Potential preprocessing or dimensionality reduction.
Steps:
- Flatten images for clustering (done below).
- Apply k-means to group images.
- Analyze clusters for patterns.
Before we can apply clustering algorithms or train models, we need to preprocess our images. Most machine learning algorithms expect input data to be in a 1-dimensional format. Currently, our images are in a 2D format with dimensions (28, 28).
Thus, let's first reshape each image from (28, 28) to a 1-dimensional array of size (784,) using the pandas apply() function.
# Flatten each image from (28, 28) to (784,)
# Ensure each image is converted to numpy array and flattened to 1D
def flatten_image(img):
    # reshape(-1) flattens a 2D (28, 28) array to (784,) and leaves an
    # already-1D (784,) array unchanged
    return np.asarray(img).reshape(-1)
df["image"] = df["image"].apply(flatten_image)
# Verify all images are flattened to shape (784,)
assert df['image'].apply(lambda img: img.shape == (784,)).all(), 'Not all images are flattened to shape (784,)'
np.stack(df['image'].values).shape
(60000, 784)
Problem 2a: K-means Clustering on the Pixels¶
Use K-means clustering to group similar images based on their pixel values. This will help us understand how well the algorithm can identify patterns in the dataset without using the labels.
Task:
- Use sklearn's KMeans class to cluster the images into 10 clusters (since there are 10 classes in the dataset). For efficiency, we will only cluster a sample of 1000 images (df_sample).
- Create a DataFrame called kmeans_df with the following columns:
  - image: the image data (flattened to 1D arrays of size 784).
  - label: the true class label of the image.
  - cluster: the cluster label assigned by K-means.

Instructions:
- When clustering, set random_state=SEED for reproducibility.
Expected Output:
The kmeans_df DataFrame should look like this:
| cluster | label | image |
|---|---|---|
| 7 | Ankle boot | [0.0, 0.0, 0.0, 0.0, 0.0, ...] |
| 6 | T-shirt/top | [0.0, 0.0, 0.0, 0.0, 1.0, ...] |
# TODO: Perform k-means clustering on the images (10 clusters to match the number of classes)
from sklearn.cluster import KMeans
df_sample = df.sample(n=1000, random_state=SEED)
# Stack the (already flattened) images into a (1000, 784) matrix for clustering
X_sample = np.stack(df_sample['image'].values)
# Verify shape: should be (1000, 784)
print(f"X_sample shape: {X_sample.shape}")
# Perform K-means clustering with 10 clusters
kmeans = KMeans(n_clusters=10, random_state=SEED)
cluster_labels = kmeans.fit_predict(X_sample)
# Create DataFrame with cluster assignments
kmeans_df = pd.DataFrame({
'image': df_sample['image'].values,
'label': df_sample['label'].values,
'cluster': cluster_labels
})
kmeans_df.head(3)
X_sample shape: (1000, 784)
| image | label | cluster | |
|---|---|---|---|
| 0 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | Bag | 1 |
| 1 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 54.0,... | T-shirt/top | 1 |
| 2 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | Trouser | 2 |
grader.check("q2a")
q2a
passed! 💯
Problem 2b: Evaluating K-means Clustering¶
K-means clustering groups data points into clusters based on their similarity. To evaluate how well the clustering algorithm has separated the classes, we can analyze the distribution of true labels within each cluster.
Task:
- Use the kmeans_df DataFrame to calculate the distribution of true labels (label) within each cluster (cluster).
- Create a stacked bar plot to visualize the label counts per cluster. Each bar should represent a cluster, and the segments of the bar should represent the counts of each label within that cluster.

Hint: If you are running into issues where there are bars "hidden" behind other ones in your Plotly bar chart, make sure you use fillna(0) or unstack(fill_value=0) after grouping by your K-means clusters.
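To see why the hint matters, here is a toy sketch (hypothetical data): when one (cluster, label) pair never occurs, unstack(fill_value=0) records an explicit 0 instead of a NaN, so every bar segment gets drawn:

```python
import pandas as pd

# Hypothetical cluster assignments where cluster 1 never contains label "B".
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "label":   ["A", "B", "A", "A"],
})

counts = toy.groupby(["cluster", "label"]).size().unstack(fill_value=0)
print(counts.loc[1, "B"])  # 0, not NaN
```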
# TODO: Create a stacked bar plot of the label counts per cluster.
# Group by cluster and label, then count occurrences
cluster_label_counts = kmeans_df.groupby(['cluster', 'label']).size().unstack(fill_value=0)
# Create bar plot and set barmode to 'stack' for stacked bars
fig = cluster_label_counts.plot(
kind='bar',
title='Distribution of True Labels in Each K-means Cluster'
)
# Update layout to stack bars
fig.update_layout(barmode='stack')
Problem 2c: Visualizing Clusters¶
To better understand the clusters formed by the K-means algorithm, we will visualize a few sample images from each cluster. This will help us identify patterns or similarities among images within the same cluster.
Task:
- For each cluster, randomly sample 7 images.
- Use the show_images function to display the sampled images in a grid.
- Observe the visual similarities among images in the same cluster.
# TODO: Plot 7 images from each cluster (use the show_images function, 10 rows, 7 columns)
# Sample up to 7 images from each cluster: shuffle once, then take the first 7 rows
# per cluster (avoids the deprecated groupby-apply-on-grouping-columns pattern)
cluster_samples = kmeans_df.sample(frac=1, random_state=SEED).groupby('cluster').head(7).sort_values('cluster').reset_index(drop=True)
# Reshape images for visualization (from 784 to 28x28)
cluster_images = np.stack([img.reshape(28, 28) for img in cluster_samples['image'].values])
# Create labels showing cluster and true label
cluster_labels_display = [f"Cluster {c}, True: {l}" for c, l in zip(cluster_samples['cluster'], cluster_samples['label'])]
# Display images
show_images(cluster_images, max_images=70, ncols=7, labels=cluster_labels_display, reshape=False)
Problem 2d: Observing Patterns in K-means Clustering¶
Reflecting on the visualizations from the previous part, we observe that the k-means clustering algorithm groups images not only by their clothing category (class) but also by other shared characteristics.
Question: Besides the clothing category, what other visual or structural characteristics of the images might the k-means clustering algorithm be grouping together?
Type your answer here, replacing this text.
Problem 3: Training a Classifier¶
In this section, we will train a machine learning classifier to predict clothing categories from image pixel data. Specifically, we will use a Multi-Layer Perceptron (MLP) classifier, which is a type of neural network.
Workflow Overview¶
We will follow a structured workflow:
- Data Preparation: Split the dataset into training and testing sets while maintaining class balance.
- Model Training: Train the MLP classifier on the training set.
- Model Evaluation: Evaluate the classifier's performance on the test set using metrics like accuracy.
- Visualization: Visualize predictions and analyze misclassifications to understand model behavior.
This workflow mirrors the process used in the lecture notebook, but you will implement some of the functions yourself to deepen your understanding.
Creating Train/Test Split As mentioned in lecture, first we will split our dataset into training and testing sets. This is a crucial step in machine learning to evaluate how well a model generalizes to unseen data.
Unlike the lecture, where we used sklearn's train_test_split function, here we split the dataset using pandas functions.
Do not change this function! Otherwise the autograder will likely fail.
df_copy = df.copy()
train_df = df_copy.groupby('label').sample(frac=0.8, random_state=SEED)
test_df = df_copy[~df_copy.index.isin(train_df.index)]
print(f"Training set size: {len(train_df)}")
print(f"Test set size: {len(test_df)}")
Training set size: 48000
Test set size: 12000
Problem 3a: Train MLP Classifier¶
In this task, we will train a Multi-Layer Perceptron (MLP) classifier to predict clothing categories from image data. The MLP is a type of neural network that is well-suited for classification tasks. The demo notebook from lecture 3 could be particularly useful.
Steps to Follow:
Data Normalization:
- Scale the pixel values of the images to the range [0, 1] for better training performance.
- Create new variables X_train_sc and X_test_sc for the scaled training and testing data, respectively. Do not overwrite the original X_train and X_test.
Model Training:
- Use the same MLP configuration (size, hyperparameters) as demonstrated in the lecture 3 notebook.
- Train the model on the normalized training data.
Loss Curve:
- Extract the loss curve from the trained model using the model.loss_curve_ attribute.
- Create a DataFrame called loss_df with two columns: epoch and loss.
- Use Plotly Express to plot the loss curve, showing how the loss decreases as the number of epochs increases.
Notes:
- The term "loss" refers to the error (textbook terminology) during training. Minimizing the loss is the goal of the training process.
- Ensure that the model is trained with reproducibility in mind (e.g., set the random seed to SEED where applicable).
# Importing necessary modules for training and preprocessing
from sklearn.neural_network import MLPClassifier # Multi-Layer Perceptron Classifier for training
from sklearn.preprocessing import StandardScaler # StandardScaler for normalizing the data
# flatten features into 1D arrays
X_train = np.stack(train_df['image'].values)
y_train = train_df['label'].values
X_test = np.stack(test_df['image'].values)
y_test = test_df['label'].values
print(f"X_train shape: {X_train.shape}\t y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}\t y_test shape: {y_test.shape}")
# TODO: Train the model using the scaled training data and plot the loss curve (remember to normalize your data!)
# NOTE: Your model must be named `model`
if load_saved_models and os.path.exists('classifier.joblib'):
    model = joblib.load('classifier.joblib')
    # Still scale the data the same way the model was trained (pixel values / 255)
    X_train_sc = X_train / 255.0
    X_test_sc = X_test / 255.0
    # Recreate loss_df from the saved model's loss curve
    loss_df = pd.DataFrame({
        'epoch': range(1, len(model.loss_curve_) + 1),
        'loss': model.loss_curve_
    })
else:
    # Step 1: Normalize the data (scale pixel values from [0, 255] to [0, 1])
    X_train_sc = X_train / 255.0
    X_test_sc = X_test / 255.0
    # Step 2: Train the MLP classifier (same configuration as the lecture 3 notebook)
    model = MLPClassifier(
        hidden_layer_sizes=(100, 50),  # Two hidden layers with 100 and 50 neurons
        max_iter=100,                  # Maximum number of training iterations
        random_state=SEED,             # For reproducibility
        verbose=False
    )
    model.fit(X_train_sc, y_train)
    # Step 3: Create the loss curve DataFrame
    loss_df = pd.DataFrame({
        'epoch': range(1, len(model.loss_curve_) + 1),
        'loss': model.loss_curve_
    })
    if save_models:
        joblib.dump(model, 'classifier.joblib')

loss_df.plot(x='epoch', y='loss', title="Training Error")
X_train shape: (48000, 784)	 y_train shape: (48000,)
X_test shape: (12000, 784)	 y_test shape: (12000,)
grader.check("q3a")
q3a
passed! 🚀
Problem 3b: Adding Predictions and Evaluation Metrics to DataFrames¶
Task: Modify both train_df and test_df by adding the following columns and compute train and test accuracy:
predicted_label: The predicted label for each image, as determined by the trained model.correct: A boolean value indicating whether the predicted label matches the true label (Truefor correct predictions,Falseotherwise).probs: The class probabilities for each image, represented as a list of size 10 (one probability per class).confidence: The probability associated with the predicted label, representing the model's confidence in its prediction.
# TODO: Add the columns listed above to `train_df` and `test_df`.
train_df = train_df.copy()
test_df = test_df.copy()
# Get predictions for training and test sets
train_predicted_labels = model.predict(X_train_sc)
test_predicted_labels = model.predict(X_test_sc)
# Get class probabilities for training and test sets
train_probs = model.predict_proba(X_train_sc)
test_probs = model.predict_proba(X_test_sc)
# Add predicted_label column
train_df['predicted_label'] = train_predicted_labels
test_df['predicted_label'] = test_predicted_labels
# Add correct column (boolean indicating if prediction matches true label)
train_df['correct'] = train_df['predicted_label'] == train_df['label']
test_df['correct'] = test_df['predicted_label'] == test_df['label']
# Add probs column (list of probabilities for each class)
train_df['probs'] = train_probs.tolist()
test_df['probs'] = test_probs.tolist()
# Add confidence column (probability of the predicted label)
train_df['confidence'] = [probs[np.argmax(probs)] for probs in train_probs]
test_df['confidence'] = [probs[np.argmax(probs)] for probs in test_probs]
print("--- Column Types ----")
for col in train_df.columns:
    val = train_df[col].iloc[0]
    print(f"{col}: {type(val)}")
print("-----------")
# Calculate accuracy
train_accuracy = train_df['correct'].mean()
test_accuracy = test_df['correct'].mean()
print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
--- Column Types ----
image: <class 'numpy.ndarray'>
label: <class 'str'>
predicted_label: <class 'str'>
correct: <class 'numpy.bool'>
probs: <class 'list'>
confidence: <class 'numpy.float64'>
-----------
Training accuracy: 0.975
Test accuracy: 0.884
grader.check("q3b")
q3b
passed! 🌈
Problem 3c: Class Accuracy Analysis and Visualization¶
Analyze the model's performance for each class and visualize the class-wise accuracy for both the training and testing datasets.
Task 1: Create a class_accuracy DataFrame¶
- Group the train_df and test_df DataFrames by label (class).
- Calculate the accuracy for each class as the proportion of correct predictions (correct column).
- Add a split column to indicate whether the data is from the training or testing set.
- Combine the results into a single DataFrame called class_accuracy with the following columns:
  - split: indicates whether the data is from the training or testing set.
  - label: the class label.
  - correct: the accuracy for the class.
Task 2: Visualize Class Accuracy¶
- Use the class_accuracy DataFrame to create a grouped bar chart.
- The x-axis should represent the class labels (label), and the y-axis should represent the accuracy (correct).
- Use different colors for the training and testing splits:
  - Training: Blue
  - Testing: Red
- Add the actual accuracy values on top of the bars, rounded to two decimal places. To do this you can add text_auto=True to your .plot call; to round to two decimal places, set text_auto='.2f'.
Hints:
- Use reset_index() after grouping to convert the grouped data into a DataFrame.
For example, after a groupby:
df.groupby(['A', 'B'])['C'].mean()
you get a Series with a multi-index:
A B
foo x 0.92
y 0.85
bar x 0.99
y 0.97
Name: C, dtype: float64
If you call .reset_index(), you get a DataFrame with columns:
A B C
0 foo x 0.92
1 foo y 0.85
2 bar x 0.99
3 bar y 0.97
This makes it much easier to plot or further manipulate the data.
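As a runnable illustration of this pattern (toy data, with the column names from the hint):

```python
import pandas as pd

# Toy data mirroring the hint: group by two keys, average a value column
df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar'],
    'B': ['x', 'y', 'x', 'y'],
    'C': [0.92, 0.85, 0.99, 0.97],
})

grouped = df.groupby(['A', 'B'])['C'].mean()  # Series with a MultiIndex
flat = grouped.reset_index()                  # back to a plain DataFrame

print(flat.columns.tolist())  # ['A', 'B', 'C']
print(flat)
```

Note that `groupby` sorts the group keys by default, so the `bar` rows come before the `foo` rows in `flat`.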
# TODO: Calculate train and test accuracy per class
# TODO: Use class_accuracy to create a grouped bar chart of class accuracy for train and test
# Task 1: Calculate accuracy per class for training set
train_class_accuracy = train_df.groupby('label')['correct'].mean().reset_index()
train_class_accuracy['split'] = 'train'
# Calculate accuracy per class for test set
test_class_accuracy = test_df.groupby('label')['correct'].mean().reset_index()
test_class_accuracy['split'] = 'test'
# Combine into single DataFrame
class_accuracy = pd.concat([train_class_accuracy, test_class_accuracy], ignore_index=True)
# Task 2: Create grouped bar chart
# Use plotly for plotting with different colors for train and test
fig = px.bar(
class_accuracy,
x='label',
y='correct',
color='split',
barmode='group',
title='Class Accuracy for Training and Testing Sets',
labels={'label': 'Class Label', 'correct': 'Accuracy'},
color_discrete_map={'train': 'blue', 'test': 'red'},
text_auto='.2f'
)
fig.update_traces(textposition='outside')
fig.show()
print(class_accuracy)
          label   correct  split
0    Ankle boot  0.999375  train
1           Bag  0.998333  train
2          Coat  0.921875  train
3         Dress  0.987500  train
4      Pullover  0.903750  train
5        Sandal  1.000000  train
6         Shirt  0.953542  train
7       Sneaker  0.996042  train
8   T-shirt/top  0.986458  train
9       Trouser  0.999792  train
10   Ankle boot  0.971667   test
11          Bag  0.955000   test
12         Coat  0.800000   test
13        Dress  0.913333   test
14     Pullover  0.754167   test
15       Sandal  0.957500   test
16        Shirt  0.730833   test
17      Sneaker  0.922500   test
18  T-shirt/top  0.858333   test
19      Trouser  0.979167   test
grader.check("q3c")
q3c
passed! 🎉
Problem 3d: Best and Worst Performing Classes¶
Question:
- Identify the best and worst performing classes for train and test splits. If tied, list all classes with the same performance.
- Do the best/worst performing classes match between splits?
- Do train and test accuracies differ? Why?
Type your answer here, replacing this text.
Problem 3e: Create Confusion Matrix¶
An often easier way to understand model performance is with a confusion matrix, which shows how often predictions match the true labels and where errors occur.
Refresher:¶
Precision: Measures the accuracy of positive predictions for a class. $$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} $$
Recall: Measures the ability to identify all positive samples for a class. $$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$
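As a quick check of these formulas on a hypothetical 2-class confusion matrix (rows are true labels, columns are predicted labels; the counts are made up for illustration):

```python
import numpy as np

# Hypothetical counts: rows = true labels, columns = predicted labels
conf = np.array([[8, 2],    # class 0: 8 correct, 2 predicted as class 1
                 [1, 9]])   # class 1: 1 predicted as class 0, 9 correct

for k in range(2):
    tp = conf[k, k]                 # true positives: diagonal entry
    fp = conf[:, k].sum() - tp      # predicted as k, actually something else
    fn = conf[k, :].sum() - tp      # actually k, predicted as something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f"class {k}: precision={precision:.3f}, recall={recall:.3f}")
```

For class 1, for example, precision is 9/(9+2) ≈ 0.818 and recall is 9/(9+1) = 0.9.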
Tasks:
- Hand-implement a confusion matrix:
- Use numpy operations to compute a 10x10 matrix where rows represent true labels and columns represent predicted labels.
- Visualize the confusion matrix:
- Use a heatmap to display the matrix for better interpretability. Y axis should be the true label and the X axis should be the predicted label.
- Using your confusion matrix, evaluate performance:
- Compute overall test accuracy.
- Calculate precision and recall for each class using the confusion matrix.
# Initialize confusion matrix with zeros
conf_matrix = np.zeros((len(class_names), len(class_names)), dtype=int)
class_to_idx = {class_name: idx for idx, class_name in enumerate(class_names)}
# Fill the confusion matrix by counting predictions and plot it as a heatmap
# Get true labels and predicted labels for test set
y_true = test_df['label'].values
y_pred = test_df['predicted_label'].values
# Hand-implement the confusion matrix with numpy:
# rows = true labels, columns = predicted labels
for true_label, pred_label in zip(y_true, y_pred):
    conf_matrix[class_to_idx[true_label], class_to_idx[pred_label]] += 1
# Plot confusion matrix as heatmap using plotly
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(
z=conf_matrix,
x=class_names,
y=class_names,
colorscale='Viridis',
text=conf_matrix,
texttemplate='%{text}',
textfont={"size": 10},
colorbar=dict(title="Count")
))
fig.update_layout(
title='Confusion Matrix',
xaxis_title='Predicted Label',
yaxis_title='True Label',
width=800,
height=700
)
fig.show()
# Calculate accuracy from confusion matrix
# Accuracy = sum of diagonal (correct predictions) / total predictions
accuracy_from_matrix = np.trace(conf_matrix) / np.sum(conf_matrix)
print(f"\nAccuracy calculated from confusion matrix: {accuracy_from_matrix:.3f}")
# Calculate per-class metrics from confusion matrix
per_class_metrics = []
print("\nPer-class metrics from confusion matrix:")
for i, class_name in enumerate(class_names):
# True Positives: diagonal element (correct predictions for this class)
true_positives = conf_matrix[i, i]
# False Positives: sum of column i (excluding diagonal) - predicted as this class but actually other classes
false_positives = np.sum(conf_matrix[:, i]) - true_positives
# False Negatives: sum of row i (excluding diagonal) - actually this class but predicted as other classes
false_negatives = np.sum(conf_matrix[i, :]) - true_positives
# Calculate precision and recall
precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
per_class_metrics.append({
'class': class_name,
'precision': precision,
'recall': recall
})
pd.DataFrame(per_class_metrics)
Accuracy calculated from confusion matrix: 0.884

Per-class metrics from confusion matrix:
|   | class | precision | recall |
|---|---|---|---|
| 0 | T-shirt/top | 0.803432 | 0.858333 |
| 1 | Trouser | 0.976725 | 0.979167 |
| 2 | Pullover | 0.841860 | 0.754167 |
| 3 | Dress | 0.895425 | 0.913333 |
| 4 | Coat | 0.817021 | 0.800000 |
| 5 | Sandal | 0.957500 | 0.957500 |
| 6 | Shirt | 0.696585 | 0.730833 |
| 7 | Sneaker | 0.955959 | 0.922500 |
| 8 | Bag | 0.976150 | 0.955000 |
| 9 | Ankle boot | 0.932800 | 0.971667 |
grader.check("q3e")
q3e
passed! 🍀
Problem 3f: Analyze Prediction Confidence¶
In this section, we will analyze the model's prediction confidence to better understand its behavior. Specifically, we will identify examples where the model is uncertain or overly confident, and evaluate how these cases relate to the correctness of its predictions.
Objectives:¶
Find the Image with the Lowest Confidence:
- Identify the image for which the model has the least confidence in its prediction.
Analyze Low Confidence but Correct Predictions:
- Find examples where the model made the correct prediction but with low confidence.
Analyze High Confidence but Incorrect Predictions:
- Identify examples where the model is highly confident but makes incorrect predictions.
Task: Let’s start by finding the image with the lowest confidence.
# TODO: Find the image with the lowest confidence by sorting the `confidence` column of `test_df`
# Sort all rows by confidence in ascending order (lowest confidence first)
least_confident = test_df.sort_values('confidence', ascending=True).reset_index(drop=True)
print("Image with lowest confidence:")
print(least_confident[['label', 'predicted_label', 'confidence', 'correct']][:3])
# Show image with lowest confidence and its predicted label (show first 8 for visualization)
show_labels = [f"{label} (Pred: {predicted_label})" for label, predicted_label in zip(least_confident["label"].tolist()[:8], least_confident["predicted_label"].tolist()[:8])]
fig = show_images(np.stack(least_confident["image"].tolist()[:8]), 8, ncols=4, labels=show_labels, reshape=True)
fig.show()
Image with lowest confidence:
label predicted_label confidence correct
0 T-shirt/top Coat 0.327358 False
1 Shirt T-shirt/top 0.330295 False
2 Coat Coat 0.343291 True
grader.check("q3f")
q3f
passed! 🍀
Problem 3g: Investigating Class Confusion for "Ankle boot"¶
Task: Visualize Low-Confidence Correct Predictions: Display 10 test images where the true label is "Ankle boot," the prediction is correct, but confidence is lowest.
# TODO: Visualize 10 images from the `test_set` whose true label is `Ankle boot` that the model correctly classified but with low confidence
# Filter for rows where both true label and predicted label are 'Ankle boot'
test_df_boot = test_df[(test_df['label'] == 'Ankle boot') & (test_df['predicted_label'] == 'Ankle boot')].copy()
# Find low confidence correct predictions (uncertain but right)
# Since test_df_boot already contains only correct predictions, we just need to sort by confidence
low_conf_correct = test_df_boot.nsmallest(10, 'confidence')
# Visualize low confidence correct predictions
if len(low_conf_correct) > 0:
show_labels = [f"True: {label} (Pred: {pred_label}, Conf: {conf:.3f})"
for label, pred_label, conf in zip(
low_conf_correct["label"].tolist(),
low_conf_correct["predicted_label"].tolist(),
low_conf_correct["confidence"].tolist()
)]
fig = show_images(np.stack(low_conf_correct["image"].tolist()), max_images=10, ncols=5, labels=show_labels, reshape=True)
fig.show()
else:
print("No low confidence correct predictions found for Ankle boot")
grader.check("q3g")
q3g
passed! 🍀
Problem 3h: Reasons for Low Confidence in the "Ankle boot" Class¶
Task: Analyze visual patterns in low-confidence images for the "Ankle boot" class. What are some potential reasons for the model to be so unconfident in these classifications?
Answer:
After visualizing the low-confidence correct predictions for "Ankle boot" class, I observe the following visual patterns:
Potential reasons for low confidence:
Visual ambiguity: Some ankle boot images may have features that resemble other footwear classes (like sneakers or sandals), making the model uncertain even when it makes the correct prediction.
Unusual angles or orientations: Images taken from non-standard angles may lack clear distinguishing features that the model relies on for confident classification.
Partial occlusion or unusual backgrounds: Some images may have parts of the boot obscured or confusing backgrounds that reduce model confidence.
Similarity to training data: If the low-confidence images differ significantly from the typical ankle boot images in the training set, the model may be less confident even when correct.
Class overlap: Ankle boots share visual characteristics with other footwear classes (sneakers, sandals), leading to lower confidence scores as the model considers multiple possibilities.
Note: Analyze the actual images displayed in Problem 3g to provide specific observations about the visual patterns you see.
Problem 3i: Investigating Class Confusion for "Trouser"¶
Now let's look at cases where the model is confidently incorrect.
Task: For the Trouser class, visualize the 10 images from the test set that are incorrectly classified as Dress with the highest confidence, and answer the question below.
# TODO: Visualize 10 images from the `test_set` whose true label is `Trouser` that the model incorrectly classified as `Dress` with high confidence
test_df_trouser = test_df[test_df['label'] == 'Trouser'].copy()
# Find high confidence incorrect predictions (confident but wrong)
# Filter for incorrect predictions where predicted label is 'Dress'
high_conf_incorrect = test_df_trouser[
(test_df_trouser['correct'] == False) &
(test_df_trouser['predicted_label'] == 'Dress')
].nlargest(10, 'confidence')
# Visualize high confidence incorrect predictions
if len(high_conf_incorrect) > 0:
show_labels = [f"True: {label} (Pred: {pred_label}, Conf: {conf:.3f})"
for label, pred_label, conf in zip(
high_conf_incorrect["label"].tolist(),
high_conf_incorrect["predicted_label"].tolist(),
high_conf_incorrect["confidence"].tolist()
)]
fig = show_images(np.stack(high_conf_incorrect["image"].tolist()), max_images=10, ncols=5, labels=show_labels, reshape=True)
fig.show()
else:
print("No high confidence incorrect predictions found for Trouser -> Dress")
grader.check("q3i")
q3i
passed! ✨
Problem 3j: Reasons for High Confidence in the "Trouser" Class¶
Task: What are some potential reasons for the model to be so confident in its classifications of some of these examples?
Answer:
After visualizing the high-confidence incorrect predictions where "Trouser" is misclassified as "Dress", I observe the following:
Potential reasons for high confidence in incorrect predictions:
Visual similarity: Some trouser images may have features that strongly resemble dresses (e.g., wide-leg trousers, flowing fabrics, similar silhouettes), leading the model to confidently but incorrectly classify them.
Shared features: Both trousers and dresses can have:
- Similar fabric patterns or textures
- Overlapping color schemes
- Similar overall shapes when viewed from certain angles
Training data bias: If the training set has more examples of dresses with trouser-like features, the model may learn to associate those features with dresses, leading to confident misclassification.
Feature extraction limitations: The MLP may be focusing on certain pixel patterns that are common to both classes, rather than learning the distinguishing features that separate trousers from dresses.
Model overconfidence: The model may have learned patterns that work well for most cases but fail on edge cases, yet still assign high confidence due to the strength of those learned patterns.
Note: Analyze the actual images displayed in Problem 3i to provide specific observations about why the model might be confident in these misclassifications.
Now that we have become more familiar with the modeling process, let’s look at how we can augment our data and how these augmentations affect our classifier.
Problem 4: Image Augmentation via Transformation Matrices¶
In this problem, you will explore how to implement image augmentations, such as rotation, flipping, and scaling, using matrix multiplication. The goal is to construct a transformation matrix $T$ such that, when multiplied by a flattened image vector, it produces the augmented image:
$$\text{augmented\_image} = T \cdot \text{original\_image}$$
Equivalently, when images are stored as row vectors (as in the code below), $\text{augmented\_image} = \text{original\_image} \cdot T^T$.
Each transformation matrix $T$ will be of size $N \times N$, where $N$ is the total number of pixels in the image (e.g., for a 28×28 image, $N=784$). Each row of $T$ defines how to compute the value of a single output pixel as a weighted sum of the input pixels.
Why Use a Transformation Matrix?¶
Using a matrix for image transformations has several advantages:
- Efficiency: Matrix multiplication is computationally efficient and can be optimized for hardware acceleration.
- Composability: Multiple transformations (e.g., rotation followed by scaling) can be combined into a single matrix by multiplying their respective transformation matrices.
- Flexibility: Any linear transformation, including interpolation, can be represented as a matrix.
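The composability point can be sketched on a tiny 3×3 image: multiplying a vertical-flip matrix by a horizontal-flip matrix yields one matrix that performs both, which amounts to a 180° rotation. (A minimal sketch; the flip matrices here are built the same way as in the examples below.)

```python
import numpy as np

h = w = 3
N = h * w
Tv = np.zeros((N, N), dtype=int)   # vertical flip (reverse rows)
Th = np.zeros((N, N), dtype=int)   # horizontal flip (reverse columns)
for i in range(h):
    for j in range(w):
        Tv[(h - 1 - i) * w + j, i * w + j] = 1
        Th[i * w + (w - 1 - j), i * w + j] = 1

x = np.arange(N)          # flattened 3x3 "image" of pixel indices
combined = Tv @ Th        # one matrix for "flip horizontally, then vertically"
assert np.array_equal(combined @ x, Tv @ (Th @ x))
print((combined @ x).reshape(h, w))
# [[8 7 6]
#  [5 4 3]
#  [2 1 0]]
```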
Example: Horizontal Flip Matrix¶
Let’s consider a simple example of flipping a 3×3 image horizontally. The flattened image is ordered row-wise:
Original indices: $$\begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{bmatrix}$$
After a horizontal flip, the columns are reversed: $$\begin{bmatrix} 2 & 1 & 0 \\ 5 & 4 & 3 \\ 8 & 7 & 6 \end{bmatrix}$$
The transformation matrix $T$ for this operation is a permutation matrix that swaps the columns for each row. For a 3×3 image, $T$ is a 9×9 matrix where each row has a single 1 in the position corresponding to the flipped pixel, and 0 elsewhere.
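This 3×3 case can be verified directly; a minimal sketch that builds the 9×9 permutation matrix and applies it to the index image above:

```python
import numpy as np

h = w = 3
N = h * w
T = np.zeros((N, N), dtype=int)
for i in range(h):
    for j in range(w):
        # Pixel (i, j) moves to (i, w - 1 - j) under a horizontal flip
        T[i * w + (w - 1 - j), i * w + j] = 1

image = np.arange(N)      # indices 0..8, flattened row-wise
flipped = T @ image
print(flipped.reshape(h, w))
# [[2 1 0]
#  [5 4 3]
#  [8 7 6]]
```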
In this question, you will:
Understand Transformation Matrices:
- Learn how to construct transformation matrices for common operations like shifting, blurring, and rotating.
Implement Augmentations:
- Write code to generate transformation matrices for the following operations:
- Shifting: Move the image left, right, up, or down.
- Blurring: Apply a smoothing effect by averaging neighboring pixels.
- Rotating: Rotate the image by a specified angle.
- Write code to generate transformation matrices for the following operations:
Combine Transformations:
- Experiment with combining multiple transformations into a single matrix and observe the results.
Each method will consist of two steps:
Create the Transformation Matrix:
Construct a 784x784 transformation matrix that represents the desired image augmentation (e.g., rotation, flipping, scaling). Each row of the matrix determines how the value of a single output pixel is computed as a weighted sum of the input pixels.

Apply the Transformation:
Use the `apply_transformation` function (provided below) to apply the transformation matrix to your image. This function will handle the matrix multiplication and reshape the output back into the original image dimensions.
Example: Vertical Flip To help you get started, we have implemented a simple vertical flip as an example. This transformation matrix swaps the rows of the image, flipping it vertically.
def apply_transformation(image, T):
# Input: A (N, 784) image vector and a (784, 784) transformation matrix
# Output: A (N, 784) image vector
transformed_flat = image @ T.T
return transformed_flat.reshape(image.shape)
def create_vertical_flip_matrix(height=28, width=28):
"""
Returns a (height*width, height*width) matrix that vertically flips an image
when applied to its flattened vector. Values are 0 or 1.
"""
N = height * width # Total number of pixels in the image
T = np.zeros((N, N), dtype=int) # Initialize the transformation matrix with zeros
for i in range(height): # Loop over each row
for j in range(width): # Loop over each column
orig_idx = i * width + j # Compute the flattened index for the original pixel
flipped_i = height - 1 - i # Compute the row index after vertical flip
flipped_idx = flipped_i * width + j # Compute the flattened index for the flipped pixel
# Set the corresponding entry in the transformation matrix to 1
# This means the pixel at (i, j) moves to (flipped_i, j)
T[flipped_idx, orig_idx] = 1
return T
def vertical_flip(image):
T_flip = create_vertical_flip_matrix()
return apply_transformation(image, T_flip)
test_image = np.load("test_image.npy")
flipped_image = vertical_flip(test_image)
show_images(np.stack([test_image, flipped_image]), labels=['Original', 'Flipped'], reshape=True)
Problem 4a: Horizontal Flip¶
Now, let's implement a horizontal flip transformation using a matrix. A horizontal flip mirrors the image along its vertical axis. For example, the leftmost column becomes the rightmost column.
Steps:
Understand the Transformation Matrix:
- The matrix `T` is `N x N` (where `N = height * width`).
- Each row of `T` has a single `1` to indicate the new position of a pixel after the flip.

Construct the Matrix:
- For each pixel `(i, j)`, compute its new position `(i, width - 1 - j)`.

Apply the Transformation:
- Use the `apply_transformation` function to apply `T` to the flattened image.

Hints:
- Adjust the `flipped_j` and `flipped_idx` variables for the horizontal flip.
- Ensure the function returns a flattened image after applying the transformation.
- Fill any empty spaces in the transformed image with `0`.
def create_horizontal_flip_matrix(height=28, width=28):
"""
Returns a (height*width, height*width) matrix that horizontally flips an image
when applied to its flattened vector. Values are 0 or 1.
"""
N = height * width
T = np.zeros((N, N), dtype=int)
for i in range(height):
for j in range(width):
orig_idx = i * width + j
# Horizontal flip: column j becomes width - 1 - j
flipped_j = width - 1 - j
flipped_idx = i * width + flipped_j
T[flipped_idx, orig_idx] = 1
return T
def horizontal_flip(image):
T_flip = create_horizontal_flip_matrix()
return apply_transformation(image, T_flip)
flipped_image = horizontal_flip(test_image)
show_images(np.stack([test_image, flipped_image]), labels=['Original', 'Horizontal Flipped'], reshape=True)
grader.check("q4a")
q4a
passed! 🌈
Problem 4b: Image Shifting¶
Task: Implement a function to shift images by a specified number of pixels in any direction.
Steps:
- Create a function that shifts an image by `dx` pixels horizontally and `dy` pixels vertically.
- Fill empty spaces with 0s.
- Handle cases where the shift moves parts of the image outside the boundaries.
- Return the shifted image as a flattened array.

Hint:
Think of copying pixels from a source region in the original image to a destination region in the final image. For example:
- If `dx` is positive (shift right), the source x-range starts at 0 and ends at `28 - dx`.
- If `dx` is negative (shift left), the source x-range starts at `-dx` and ends at 28.
- If `dy` is positive (shift up), the source y-range starts at 0 and ends at `28 - dy`.
- If `dy` is negative (shift down), the source y-range starts at `-dy` and ends at 28.
Ensure the function returns a flattened image.
Fill any empty spaces in the transformed image with 0
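The hint's source/destination ranges can be sanity-checked with plain array slicing before building the matrix. A sketch only: `shift_2d` is a hypothetical helper, and its sign convention takes positive `dy` as moving content toward larger row indices, so adapt the signs to match your own convention.

```python
import numpy as np

def shift_2d(img, dx, dy):
    """Shift a 2D image by dx columns and dy rows, zero-filling exposed pixels.
    Convention assumed here: positive dx moves content right, positive dy
    moves content toward larger row indices."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Source region: the part of the input that stays inside the image
    src_r = slice(max(0, -dy), min(h, h - dy))
    src_c = slice(max(0, -dx), min(w, w - dx))
    # Destination region: where that part lands in the output
    dst_r = slice(max(0, dy), min(h, h + dy))
    dst_c = slice(max(0, dx), min(w, w + dx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

img = np.arange(16).reshape(4, 4)
print(shift_2d(img, 1, 0))  # each row shifted right by one, zeros in column 0
```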
def create_shift_matrix(dx, dy, height=28, width=28):
"""
Create a transformation matrix for shifting an image by dx pixels horizontally and dy pixels vertically.
Args:
dx (int): Number of pixels to shift horizontally.
dy (int): Number of pixels to shift vertically.
height (int): Height of the image.
width (int): Width of the image.
Returns:
np.ndarray: A (height*width, height*width) transformation matrix for shifting.
"""
N = height * width
T = np.zeros((N, N))
# For each pixel in the output image, find which pixel from the input image it comes from
for i in range(height):
for j in range(width):
# Destination position (where we're writing to)
dest_i = i
dest_j = j
dest_idx = dest_i * width + dest_j
# Source position (where we're reading from)
# Shift: move dx pixels horizontally, dy pixels vertically
src_i = i - dy # If dy > 0 (shift up), we read from lower rows
src_j = j - dx # If dx > 0 (shift right), we read from left columns
# Check if source is within bounds
if 0 <= src_i < height and 0 <= src_j < width:
src_idx = src_i * width + src_j
T[dest_idx, src_idx] = 1
return T
def shift_image(image, dx, dy):
"""
Shift an image by dx pixels horizontally and dy pixels vertically.
Args:
image (np.ndarray): Flattened image array of shape (height*width,).
dx (int): Number of pixels to shift horizontally.
dy (int): Number of pixels to shift vertically.
Returns:
np.ndarray: Shifted image as a flattened array.
"""
T = create_shift_matrix(dx, dy)
return apply_transformation(image, T)
shifted_right_image = shift_image(test_image, 5, 0)
shifted_left_image = shift_image(test_image, -5, 0)
shifted_up_image = shift_image(test_image, 0, -5)
shifted_down_image = shift_image(test_image, 0, 5)
all_images = np.stack([test_image, shifted_up_image, shifted_down_image, shifted_right_image, shifted_left_image])
plot_labels = ['Original', 'Shifted Up', 'Shifted Down', 'Shifted Right', 'Shifted Left']
show_images(all_images, labels=plot_labels, reshape=True)
grader.check("q4b")
q4b
passed! 🙌
Problem 4c: Image Blurring¶
Task
Implement a blurring function using a transformation matrix that averages the values of neighboring pixels.
What is blurring?
Blurring reduces the sharpness of an image by averaging each pixel with its neighbors, creating a smoother appearance.
This is done with a sliding square kernel (window) that moves across the image.
For each pixel, the kernel specifies which surrounding pixels contribute to the average.
Key Concepts
- Kernel Size: controls how many neighbors are included in the average.
  - A 3×3 kernel averages a pixel with its 8 immediate neighbors.
  - A 5×5 kernel averages a pixel with its 24 neighbors.
- Blurring Process
- For each pixel, place a square kernel centered on that pixel.
- Collect all pixels that fall inside the kernel and inside the image.
- Compute the average of these valid pixels and assign it to the center pixel.
Edge handling:
If the kernel extends beyond the image border, only the pixels that actually overlap the image are averaged.
Example with a 4×4 image using a 3×3 kernel
Original 4×4 image:
$$\begin{bmatrix} 10 & 20 & 30 & 40 \\ 15 & 25 & 35 & 45 \\ 50 & 60 & 70 & 80 \\ 55 & 65 & 75 & 85 \end{bmatrix}$$
Consider the pixel with value 25 in row 2 col 2 [index (1, 1) in the matrix].
Its 3×3 window contains:
$$\begin{bmatrix} 10 & 20 & 30 \\ 15 & \textbf{25} & 35 \\ 50 & 60 & 70 \end{bmatrix}$$
The blurred value for this position is the average of the numbers in this window (35).
For a corner pixel like (0, 0), the 3×3 window lies partly outside the image, so we average only the four valid pixels: $$\frac{10 + 20 + 15 + 25}{4} = 17.5$$
Applying this process to every pixel produces a softened 4×4 image.
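The two window averages worked out above (35 for the center pixel, 17.5 for the corner) can be checked with a direct per-pixel loop; `blur_pixel` is a hypothetical helper for checking single values, not the required matrix form.

```python
import numpy as np

img = np.array([[10, 20, 30, 40],
                [15, 25, 35, 45],
                [50, 60, 70, 80],
                [55, 65, 75, 85]], dtype=float)

def blur_pixel(img, i, j, k=3):
    """Mean of the k x k window centered at (i, j), clipped to the image."""
    pad = k // 2
    window = img[max(0, i - pad): i + pad + 1, max(0, j - pad): j + pad + 1]
    return window.mean()

print(blur_pixel(img, 1, 1))  # 35.0  (full 3x3 window)
print(blur_pixel(img, 0, 0))  # 17.5  (corner: only 4 valid pixels)
```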
Steps:
- Implement a function that, for each pixel, averages over a centered square window (kernel) of odd size (e.g., 3, 5, 7). Handle edges by averaging only the valid neighbors.
- Use a transformation matrix to apply this operation to the entire image.
- Ensure the function works for any odd kernel size (e.g., 3x3, 5x5).
- Return the blurred image as a flattened array.
Fill any empty spaces in the transformed image with 0
def create_blur_matrix(kernel_size=3, height=28, width=28):
"""
Create a transformation matrix T that applies a uniform mean blur using a centered, odd-sized square sliding window.
For each output pixel (i, j):
1) Place a `kernel_size × kernel_size` window centered at (i, j).
2) If the window is outside the image, then it will have fewer neighbors (only average the pixels that exist)
Args:
kernel_size (int): Size of the square kernel (must be odd).
height (int): Height of the image.
width (int): Width of the image.
Returns:
np.ndarray: A (height*width, height*width) transformation matrix for blurring.
"""
N = height * width
T = np.zeros((N, N))
pad = kernel_size // 2
# For each output pixel (i, j), compute the average of its kernel_size x kernel_size neighborhood
for i in range(height):
for j in range(width):
output_idx = i * width + j
# Find all pixels in the kernel centered at (i, j)
valid_pixels = []
for di in range(-pad, pad + 1):
for dj in range(-pad, pad + 1):
ni = i + di # neighbor row
nj = j + dj # neighbor column
# Check if neighbor is within image bounds
if 0 <= ni < height and 0 <= nj < width:
input_idx = ni * width + nj
valid_pixels.append(input_idx)
# Each valid pixel contributes equally (uniform mean blur)
if len(valid_pixels) > 0:
weight = 1.0 / len(valid_pixels)
for input_idx in valid_pixels:
T[output_idx, input_idx] = weight
return T
def blur_image(image, kernel_size=3):
"""
Apply a blur transformation to a flattened image array or a batch of flattened images.
Args:
image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
kernel_size (int): Size of the square kernel to use for blurring.
Returns:
np.ndarray: Blurred image(s) as a flattened array or batch of arrays.
"""
T = create_blur_matrix(kernel_size)
return apply_transformation(image, T)
blurred_1x1 = blur_image(test_image, kernel_size=1)
blurred_3x3 = blur_image(test_image, kernel_size=3)
blurred_5x5 = blur_image(test_image, kernel_size=5)
blurred_images = [test_image, blurred_1x1, blurred_3x3, blurred_5x5]
blurred_labels = ['Original', 'Blur 1x1', 'Blur 3x3', 'Blur 5x5']
show_images(blurred_images, labels=blurred_labels, reshape=True)
grader.check("q4c")
q4c
passed! 🚀
Problem 4d: Image Rotation¶
Task: Implement a function to rotate an image by a given angle theta (in degrees).
Steps:
Create the Rotation Matrix:
- Write a function `create_rotation_matrix(theta)` that generates a transformation matrix to rotate a flattened image by `theta` degrees.
- Convert `theta` from degrees to radians using `np.deg2rad(theta)` before applying trigonometric functions.
- Ensure the center of rotation is the center of the image.

Apply the Transformation:
- The output should be a transformation matrix of shape `(height*width, height*width)`.
- When this matrix is multiplied by the flattened image, it should produce the rotated image (also flattened).
Hint: Use trigonometric functions (sin, cos) to calculate the new positions of pixels after rotation.
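The inverse-mapping idea can be sketched for a single output coordinate: to fill an output pixel at offset (x, y) from the image center, rotate that offset backwards by `theta` to find where to read from. (The exact center and rounding conventions are left to your implementation.)

```python
import numpy as np

theta = np.deg2rad(90)
c, s = np.cos(theta), np.sin(theta)

# Output pixel at offset (x, y) from the image center (y pointing up);
# apply R(-theta) to locate the source pixel it should copy from.
x, y = 1.0, 0.0
src_x = c * x + s * y
src_y = -s * x + c * y
print(src_x, src_y)  # approximately (0, -1) for a 90-degree rotation
```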
def create_rotation_matrix(theta, height=28, width=28):
"""
Create a transformation matrix for rotating an image by theta degrees.
Args:
theta (float): Angle of rotation in degrees.
height (int): Height of the image.
width (int): Width of the image.
Returns:
np.ndarray: A (height*width, height*width) transformation matrix for rotating.
"""
theta = np.deg2rad(theta)
N = height * width
T = np.zeros((N, N))
ci = (height - 1) / 2.0
cj = (width - 1) / 2.0
cos_t = np.cos(theta)
sin_t = np.sin(theta)
for i in range(height):
for j in range(width):
out_idx = i * width + j
# image coordinate system (y axis points down)
x = j - cj
y = ci - i
# inverse mapping (rotate backwards by -theta to find source)
src_x = cos_t * x + sin_t * y
src_y = -sin_t * x + cos_t * y
src_j = src_x + cj
src_i = ci - src_y
src_i = int(np.round(src_i))
src_j = int(np.round(src_j))
if 0 <= src_i < height and 0 <= src_j < width:
in_idx = src_i * width + src_j
T[out_idx, in_idx] = 1
return T
def rotate_image(image, theta):
"""
Apply a rotation transformation to a flattened image array or a batch of flattened images.
Args:
image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
theta (float): Angle of rotation in degrees.
Returns:
np.ndarray: Rotated image(s) as a flattened array or batch of arrays.
"""
T = create_rotation_matrix(theta)
return apply_transformation(image, T)
# rotate with matrix
rotated_45 = rotate_image(test_image, 45)
rotated_90 = rotate_image(test_image, 90)
rotated_200 = rotate_image(test_image, 200)
rotated_270 = rotate_image(test_image, 270)
# visualize original and 4 augmentations in plotly image grid
all_images = np.stack([test_image, rotated_45, rotated_90, rotated_200, rotated_270])
plot_labels = ['Original', 'Rotated (45°)', 'Rotated (90°)', 'Rotated (200°)', 'Rotated (270°)']
show_images(all_images, labels=plot_labels, reshape=True)
grader.check("q4d")
q4d
results:
q4d - 1
result:
❌ Test case failed
Trying:
assert create_rotation_matrix(15).shape == (784, 784), 'Rotation matrix should be 784x784'
Expecting nothing
ok
Trying:
gt_rotate_45_transform = np.load('public_solutions/rotate_45_transform.npy')
Expecting nothing
ok
Trying:
gt_rotate_90_transform = np.load('public_solutions/rotate_90_transform.npy')
Expecting nothing
ok
Trying:
gt_rotate_200_transform = np.load('public_solutions/rotate_200_transform.npy')
Expecting nothing
ok
Trying:
gt_rotate_270_transform = np.load('public_solutions/rotate_270_transform.npy')
Expecting nothing
ok
Trying:
gt_rotate_45_transform_updated = np.load('public_solutions/rotate_45_transform_updated.npy')
Expecting nothing
ok
Trying:
gt_rotate_90_transform_updated = np.load('public_solutions/rotate_90_transform_updated.npy')
Expecting nothing
ok
Trying:
gt_rotate_200_transform_updated = np.load('public_solutions/rotate_200_transform_updated.npy')
Expecting nothing
ok
Trying:
gt_rotate_270_transform_updated = np.load('public_solutions/rotate_270_transform_updated.npy')
Expecting nothing
ok
Trying:
assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
Expecting nothing
**********************************************************************
Line 10, in q4d 0
Failed example:
assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
AssertionError: Rotate 45 image does not match solution
Trying:
assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
Expecting nothing
**********************************************************************
Line 11, in q4d 0
Failed example:
assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
AssertionError: Rotate 90 image does not match solution
Trying:
assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
Expecting nothing
**********************************************************************
Line 12, in q4d 0
Failed example:
assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
AssertionError: Rotate 200 image does not match solution
Trying:
assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
Expecting nothing
**********************************************************************
Line 13, in q4d 0
Failed example:
assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
AssertionError: Rotate 270 image does not match solution
Notice something? For some rotations, we are left with holes in the image.
Understanding Gaps in Rotated Images¶
When rotating an image, you may notice white spaces (gaps) in the output. These gaps occur due to the way nearest-neighbor interpolation works. Let’s explore this using a simple $3 \times 3$ image.
Original Image Grid
The pixel coordinates are:
$$ \begin{bmatrix} (0,0) & (0,1) & (0,2) \\ (1,0) & (1,1) & (1,2) \\ (2,0) & (2,1) & (2,2) \end{bmatrix} $$
The center of the image is at $(1,1)$.
Rotation by $45^\circ$
Translate the center to the origin
For pixel $(0,0)$:$$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix} $$
Apply the rotation matrix
The rotation matrix for $45^\circ$ is:$$ R(45^\circ) = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} $$
Applying the rotation:
$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = R(45^\circ) \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \tfrac{1}{\sqrt{2}} \begin{bmatrix} (-1) - (-1) \\ (-1) + (-1) \end{bmatrix} = \begin{bmatrix} 0 \\ -\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} 0 \\ -1.4142 \end{bmatrix} $$
Translate back to the original center
$$ \begin{bmatrix} \text{new}_x \\ \text{new}_y \end{bmatrix} = \begin{bmatrix} 0 \\ -1.4142 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 1 \\ -0.4142 \end{bmatrix} $$
Nearest-Neighbor Assignment
To map the rotated pixel back to the grid, we round to the nearest integers:
$$ \text{new row} = \operatorname{round}(-0.4142) = 0, \quad \text{new column} = \operatorname{round}(1) = 1 $$
Thus, pixel $(0,0)$ maps to $(0,1)$ in the rotated image.
Why Do Gaps Appear?
When mapping all pixels:
- Overlaps: Multiple original pixels may round to the same target coordinates.
- Gaps: Some target coordinates are never assigned, leaving empty pixels (white spaces).
The rounding step in nearest-neighbor interpolation is the primary cause of these overlaps and gaps in the rotated image.
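The forward-mapping behavior described above can be checked directly. The sketch below (illustrative only, not part of the assignment code) forward-maps every pixel of a 5x5 grid through a 45° rotation about the center with nearest-neighbor rounding, then lists the grid cells that no source pixel ever lands on; those cells are exactly the holes you see in the rotated images.

```python
import numpy as np

# Forward-map each pixel of a 5x5 grid through a 45-degree rotation about the
# center (2, 2), rounding to the nearest cell, and record which cells get hit.
n = 5
theta = np.deg2rad(45)
c, s = np.cos(theta), np.sin(theta)
center = (n - 1) / 2

hit = set()
for i in range(n):
    for j in range(n):
        x, y = j - center, i - center                    # translate center to origin
        xr, yr = c * x - s * y, s * x + c * y            # rotate counterclockwise
        ni, nj = round(yr + center), round(xr + center)  # translate back and round
        if 0 <= ni < n and 0 <= nj < n:
            hit.add((ni, nj))

gaps = {(i, j) for i in range(n) for j in range(n)} - hit
print(sorted(gaps))  # cells never assigned by any source pixel -> white holes
```

Note that the center pixel always maps to itself, while cells such as (0, 0) are unreachable: no integer source coordinate rounds onto them after rotation.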
Problem 4e: Bilinear Interpolation for Image Rotation¶
Task: When rotating an image, gaps (white spaces) can appear due to nearest-neighbor assignment. To avoid these gaps, set each output pixel to a weighted average of the 4 nearest source pixels. This approach is called bilinear interpolation and is common in image processing for producing smoother, gap-free results.
Steps:
- For each output pixel:
- Translate its coordinates so that the rotation center is at the origin.
- Apply the inverse rotation (i.e., rotate backward by the desired angle).
- Translate the coordinates back to the original image space to locate the corresponding source position.
- If this source position falls outside the image, set the output pixel to 0.
- If the source position is inside the image:
  - Find the four nearest source pixels surrounding this position (top-left, top-right, bottom-left, bottom-right).
  - Compute the fractional distances from the source position to these neighbors (horizontal and vertical offsets).
  - Compute a weighted average of the four neighbor values using these offsets (bilinear interpolation).
- Assign the computed value to the output pixel. If any neighbor used in the interpolation falls outside the image, treat its value as 0.
- Repeat for all pixels.
This method uses inverse mapping (sampling from the original image) rather than forward mapping (mapping source pixels to output), which helps prevent gaps.
def create_bilinear_rotation_matrix(theta, height=28, width=28):
"""
Create a (height*width, height*width) matrix that applies bilinear interpolation
for rotating a flattened image by theta degrees.
Each row of the matrix gives the weights for the input pixels that contribute to each output pixel.
Args:
theta (float): Angle of rotation in degrees.
height (int): Height of the image.
width (int): Width of the image.
Returns:
np.ndarray: A (height*width, height*width) transformation matrix for rotating.
"""
theta_rad = np.deg2rad(theta)
N = height * width
T = np.zeros((N, N))
center_i = height / 2.0
center_j = width / 2.0
# Rotation matrix for counterclockwise rotation
cos_theta = np.cos(theta_rad)
sin_theta = np.sin(theta_rad)
# For each output pixel (i, j), find which input pixels contribute (bilinear interpolation)
for i in range(height):
for j in range(width):
output_idx = i * width + j
# Translate to center-origin coordinates
x = j - center_j
y = i - center_i
# Apply inverse rotation (rotate backwards to find source)
src_x = x * cos_theta + y * sin_theta
src_y = -x * sin_theta + y * cos_theta
# Translate back from center-origin
src_j = src_x + center_j
src_i = src_y + center_i
# Bilinear interpolation: find the 4 surrounding pixels
i0 = int(np.floor(src_i))
i1 = i0 + 1
j0 = int(np.floor(src_j))
j1 = j0 + 1
# Fractional parts for interpolation weights
di = src_i - i0
dj = src_j - j0
# Get weights for the 4 corners: (i0,j0), (i0,j1), (i1,j0), (i1,j1)
# Bilinear interpolation weights
w00 = (1 - di) * (1 - dj) # weight for (i0, j0)
w01 = (1 - di) * dj # weight for (i0, j1)
w10 = di * (1 - dj) # weight for (i1, j0)
w11 = di * dj # weight for (i1, j1)
# Add weights to transformation matrix for valid pixels
for ni, nj, weight in [(i0, j0, w00), (i0, j1, w01), (i1, j0, w10), (i1, j1, w11)]:
if 0 <= ni < height and 0 <= nj < width:
input_idx = ni * width + nj
T[output_idx, input_idx] += weight
return T
def rotate_image_bilinear(image, theta):
"""
Rotate an image using bilinear interpolation.
Args:
image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
theta (float): Angle of rotation in degrees.
Returns:
np.ndarray: Rotated image as a flattened array.
"""
T = create_bilinear_rotation_matrix(theta)
return apply_transformation(image, T)
# rotate with matrix
rotated = rotate_image(test_image, 45)
rotated_interpolated = rotate_image_bilinear(test_image, 45)
all_images = np.stack([test_image, rotated, rotated_interpolated])
plot_labels = ['Original', 'Rotated 45°', 'Rotated 45° (Bilinear)']
show_images(all_images, labels=plot_labels, reshape=True)
grader.check("q4e")
q4e
passed! 🎉
Problem 4f: Composing Transformations¶
An advantage of transformation matrices is their composability: you can combine multiple transformations into a single matrix. This allows you to apply multiple transformations to an image with the same computational cost as applying just one.
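This composability follows from the associativity of matrix multiplication: applying T1 and then T2 to a flattened image x gives T2 @ (T1 @ x), which equals (T2 @ T1) @ x. A quick sanity check with small random matrices standing in for the 784x784 transforms:

```python
import numpy as np

# Applying T1 then T2 step by step matches applying the single precomputed
# matrix T2 @ T1. Small random 9x9 matrices stand in for the 784x784 transforms.
rng = np.random.default_rng(0)
T1 = rng.standard_normal((9, 9))
T2 = rng.standard_normal((9, 9))
x = rng.standard_normal(9)

step_by_step = T2 @ (T1 @ x)  # two matrix-vector products
composed = (T2 @ T1) @ x      # one precomputed matrix, one product
assert np.allclose(step_by_step, composed)
```

Once T2 @ T1 is precomputed, every subsequent image costs a single matrix-vector product, no matter how many transformations were composed.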
Task:
- Compose Multiple Transformations: Implement compose_transforms(*Ts), which takes any number of 784x784 transformation matrices (e.g., shift, rotate, blur) and returns a single matrix that represents applying all transformations in sequence. The transformations should be applied in the order they are provided: the first matrix is applied first, followed by the second, and so on.
- Rotate and Blur: Implement rotate_then_blur(image, theta, kernel_size), which rotates an image by theta degrees (without bilinear interpolation) and then applies a blur with a kernel of size kernel_size. Use compose_transforms to combine the transformations and apply them to the image.
- Shift, Rotate, and Blur: Implement shift_then_rotate_then_blur(image, dx, dy, theta, kernel_size), which shifts an image by (dx, dy), rotates it by theta degrees (without bilinear interpolation), and then applies a blur with a kernel of size kernel_size. Again, use compose_transforms to combine the transformations and apply them to the image.
def compose_transforms(*Ts):
"""
Compose linear image transforms (each 784x784).
Inputs:
Ts: list of transformation matrices
Returns:
T_total: composition of all input transformations
"""
# If no transforms, return identity
if len(Ts) == 0:
return np.eye(784)
# Start with identity matrix
T_total = np.eye(Ts[0].shape[0])
# Apply transformations in order: T1, then T2, then T3, ...
# For composition: if we apply T1 then T2, the combined matrix is T2 @ T1
# (because (T2 @ T1) @ x = T2 @ (T1 @ x))
# So we multiply from right to left: T_total = T_n @ ... @ T2 @ T1
for T in Ts:
T_total = T @ T_total
return T_total
def rotate_then_blur(image, theta, kernel_size):
"""
Rotate an image by theta degrees (without bilinear interpolation) and then blur it with a kernel of size kernel_size.
"""
T_rotate = create_rotation_matrix(theta)
T_blur = create_blur_matrix(kernel_size)
T_composed = compose_transforms(T_rotate, T_blur)
return apply_transformation(image, T_composed)
def shift_then_rotate_then_blur(image, dx, dy, theta, kernel_size):
"""
Shift an image by (dx, dy), then rotate it by theta degrees (without bilinear interpolation), and then blur it with a kernel of size kernel_size.
"""
T_shift = create_shift_matrix(dx, dy)
T_rotate = create_rotation_matrix(theta)
T_blur = create_blur_matrix(kernel_size)
T_composed = compose_transforms(T_shift, T_rotate, T_blur)
return apply_transformation(image, T_composed)
rotated_blurred_image = rotate_then_blur(test_image, 45, 3)
shifted_rotated_blurred_image = shift_then_rotate_then_blur(test_image, 1, -4, 200, 5)
all_images = np.stack([test_image, rotated_blurred_image, shifted_rotated_blurred_image])
plot_labels = ['Original', 'Rotated 45°, Blurred 3x3', 'Shifted (1, -4), Rotated 200°, Blurred 5x5']
show_images(all_images, labels=plot_labels, reshape=True)
grader.check("q4f")
q4f
results:
q4f - 1
result:
❌ Test case failed
Trying:
assert compose_transforms(create_rotation_matrix(45), create_blur_matrix(2)).shape == (784, 784), 'Compose transforms should return a 784x784 matrix'
Expecting nothing
ok
Trying:
gt_rotate_then_blur_transform = np.load('public_solutions/rotate_then_blur_transform.npy')
Expecting nothing
ok
Trying:
gt_shift_then_rotate_then_blur_transform = np.load('public_solutions/shift_then_rotate_then_blur_transform.npy')
Expecting nothing
ok
Trying:
gt_rotate_then_blur_transform_updated = np.load('public_solutions/rotate_then_blur_transform_updated.npy')
Expecting nothing
ok
Trying:
gt_shift_then_rotate_then_blur_transform_updated = np.load('public_solutions/shift_then_rotate_then_blur_transform_updated.npy')
Expecting nothing
ok
Trying:
assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
Expecting nothing
**********************************************************************
Line 6, in q4f 0
Failed example:
assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
AssertionError: Rotate then blur image does not match solution
Trying:
assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
Expecting nothing
**********************************************************************
Line 7, in q4f 0
Failed example:
assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
Exception raised:
Traceback (most recent call last):
File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "", line 1, in
assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
AssertionError: Shift then rotate then blur image does not match solution
Problem 4g: Matrix Multiply Questions¶
- Does the order in which you apply transformations matter? Why or why not?
- When can a transformation be undone (i.e., when can you multiply your augmented image by another transformation matrix to recover the original image)? What matrix would you multiply by to recover the original image?
- Which of the augmentations implemented above can be "undone"? For augmentations that can be undone but may lose information (e.g., parts of the image are cut off), explain the conditions under which this occurs.
- Which of these augmentations cannot be "undone" with another matrix multiplication? Why not?
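For intuition on these questions, here is a self-contained toy sketch (the matrices are built inline and are not the notebook's create_* helpers): a horizontal flip of a 1x4 "image" is a permutation matrix, which is invertible, while a 2-pixel circular averaging "blur" is rank-deficient, so no matrix multiplication can undo it.

```python
import numpy as np

# Flip: reverse the 4 pixels. A permutation matrix is always invertible.
flip = np.eye(4)[::-1]

# Toy blur: average each pixel with its right neighbor (circular).
blur = np.zeros((4, 4))
for i in range(4):
    blur[i, i] = 0.5
    blur[i, (i + 1) % 4] = 0.5

x = np.array([1.0, 2.0, 3.0, 4.0])

# The flip can be undone exactly by multiplying with its inverse.
assert np.allclose(np.linalg.inv(flip) @ (flip @ x), x)

print(np.linalg.matrix_rank(flip))  # 4: full rank, invertible
print(np.linalg.matrix_rank(blur))  # 3: rank-deficient, cannot be inverted
```

The blur collapses the alternating pattern (1, -1, 1, -1) to zero, so two different inputs can produce the same blurred output; that lost information cannot be recovered by any linear map.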
Testing Augmentation on Classifier Performance¶
In this section, we will evaluate how our trained classifier performs on augmented versions of the test images. This will help us understand the robustness of the model to various transformations.
The goal is to analyze the impact of different augmentation techniques on the classifier's performance. Specifically, we will:
Create Augmented Test Images:
- Use the image augmentation functions (e.g., rotation, flipping, shifting, blurring) to generate transformed versions of the test images.
Evaluate the Classifier:
- Test the classifier on the augmented images.
- Measure and compare the accuracy for each augmentation type.
Visualize Results:
- Plot the performance metrics to identify which augmentations the classifier handles well and which ones degrade performance.
Problem 4h: Augmenting Test Images¶
Task: Create augmented versions of the test images using the image augmentation functions we implemented earlier.
Steps:
- Apply each augmentation technique (e.g., horizontal flip, vertical flip, rotation, shifting, blurring) to a sample of 100 test images. This should result in 1500 images (15 augmentations $\times$ 100 test images)
- Store the augmented images in a structured format for evaluation.
- Ensure that the augmented images are labeled correctly for comparison with the classifier's predictions.
# Test augmentation functions on a few examples
test_images = np.stack(test_df['image'])
test_labels = test_df['label']
shift_inputs = [(5, 0), (-5, 0), (0, 5), (0, -5)]
rotate_inputs = [45, 90, 200]
blur_inputs = [3, 5]
rotate_blur_inputs = [(45, 3), (90, 5)]
shift_rotate_blur_inputs = [((5, 0), 45, 3), ((-5, 0), 90, 5)]
augmented_data = []
# Randomly sample 100 datapoints from test_images
sample_idx = np.random.choice(len(test_images), 100, replace=False)
test_images_sample = test_images[sample_idx]
test_labels_sample = np.array(test_labels)[sample_idx]
# TODO: Apply the augmentation functions we just created (shift, blur, rotate w/ bilinear, rotate then blur, shift then rotate then blur) to every image from test_images_sample
# use the inputs defined above to apply the augmentations
# Save the augmented images in a new DataFrame aug_df
augmented_data = []
# Apply horizontal flip
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': horizontal_flip(img),
'label': label,
'augmentation': 'horizontal_flip',
'type': 'flip'
})
# Apply vertical flip
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': vertical_flip(img),
'label': label,
'augmentation': 'vertical_flip',
'type': 'flip'
})
# Apply shifts
for dx, dy in shift_inputs:
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': shift_image(img, dx, dy),
'label': label,
'augmentation': f'shift_{dx}_{dy}',
'type': 'shift'
})
# Apply rotations (with bilinear interpolation)
for theta in rotate_inputs:
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': rotate_image_bilinear(img, theta),
'label': label,
'augmentation': f'rotate_{theta}',
'type': 'rotate'
})
# Apply blur
for kernel_size in blur_inputs:
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': blur_image(img, kernel_size),
'label': label,
'augmentation': f'blur_{kernel_size}x{kernel_size}',
'type': 'blur'
})
# Apply rotate then blur
for theta, kernel_size in rotate_blur_inputs:
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': rotate_then_blur(img, theta, kernel_size),
'label': label,
'augmentation': f'rotate_{theta}_blur_{kernel_size}x{kernel_size}',
'type': 'rotate_blur'
})
# Apply shift then rotate then blur
for (dx, dy), theta, kernel_size in shift_rotate_blur_inputs:
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
augmented_data.append({
'original_idx': sample_idx[orig_idx],
'image': shift_then_rotate_then_blur(img, dx, dy, theta, kernel_size),
'label': label,
'augmentation': f'shift_{dx}_{dy}_rotate_{theta}_blur_{kernel_size}x{kernel_size}',
'type': 'shift_rotate_blur'
})
# Create DataFrame
aug_df = pd.DataFrame(augmented_data)
# TODO: Select an image and visualize it with all the augmentations applied to it
# Select first image (index 0) and show all its augmentations
# Count unique augmentation types
unique_augs = aug_df['augmentation'].unique()
first_image_idx = 0
first_image_augs = aug_df[aug_df.index.isin([first_image_idx + i * len(test_images_sample) for i in range(len(unique_augs))])].copy()
if len(first_image_augs) > 0:
aug_images = np.stack(first_image_augs['image'].tolist())
aug_labels_list = [f"{aug} ({t})" for aug, t in zip(first_image_augs['augmentation'], first_image_augs['type'])]
fig = show_images(aug_images, max_images=len(aug_images), ncols=5, labels=aug_labels_list, reshape=True)
fig.show()
grader.check("q4h")
q4h
passed! 🍀
Problem 4i: Evaluating Classifier Performance on Augmented Data¶
Task: Evaluate the classifier's performance on the augmented test data and compare its accuracy across different types of augmentations. Create a DataFrame named aug_performance with the following columns:
- augmentation: A string describing the applied augmentation (e.g., "shift_5_0", "rotate_90", "blur_2x2").
- accuracy: The classifier's accuracy on the augmented data.
- type: The augmentation type (e.g., blur, rotate, shift, rotate_blur, shift_rotate_blur, none).
Hints:
- Check the image column's data type and shape. The model likely expects a 3D array. Use np.stack to combine all augmented images in your DataFrame before scaling and passing them to the model.
- Use scikit-learn's StandardScaler to scale the data before evaluation.
# Evaluate classifier performance on augmented data
from sklearn.preprocessing import StandardScaler
# First, compute predictions for all augmented images
aug_results = []
# Group by augmentation type
for aug_name in aug_df['augmentation'].unique():
aug_subset = aug_df[aug_df['augmentation'] == aug_name]
aug_type = aug_subset['type'].iloc[0]
# Get images and labels
aug_images = np.stack(aug_subset['image'].tolist())
aug_labels = aug_subset['label'].tolist()
# Scale the augmented images (normalize to [0, 1] like training data)
aug_images_sc = aug_images / 255.0
# Make predictions
aug_predictions = model.predict(aug_images_sc)
# Store results for each image
for label, pred in zip(aug_labels, aug_predictions):
aug_results.append({
'augmentation': aug_name,
'type': aug_type,
'correct': (label == pred)
})
# Add baseline (no augmentation) performance
baseline_images = np.stack(test_images_sample)
baseline_labels = test_labels_sample
baseline_images_sc = baseline_images / 255.0
baseline_predictions = model.predict(baseline_images_sc)
for label, pred in zip(baseline_labels, baseline_predictions):
aug_results.append({
'augmentation': 'none',
'type': 'none',
'correct': (label == pred)
})
# Create DataFrame with individual results
aug_results_df = pd.DataFrame(aug_results)
# Use groupby and agg to compute accuracy for each augmentation
aug_performance = aug_results_df.groupby(['augmentation', 'type']).agg({
'correct': 'mean'
}).reset_index()
aug_performance.columns = ['augmentation', 'type', 'accuracy']
# Sort by accuracy
aug_performance = aug_performance.sort_values('accuracy', ascending=False)
print(aug_performance)
# Visualize performance: sort by accuracy, color by augmentation type (blur, rotate, shift, none)
fig = px.bar(
aug_performance,
x='augmentation',
y='accuracy',
color='type',
title='Classifier Performance on Augmented Data',
labels={'augmentation': 'Augmentation Type', 'accuracy': 'Accuracy'},
text_auto='.3f'
)
fig.update_xaxes(tickangle=45)
fig.update_layout(height=600)
fig.show()
    augmentation                   type               accuracy
3   none                           none               0.88
0   blur_3x3                       blur               0.85
1   blur_5x5                       blur               0.77
2   horizontal_flip                flip               0.58
12  shift_0_5                      shift              0.50
11  shift_0_-5                     shift              0.37
9   shift_-5_0                     shift              0.28
13  shift_5_0                      shift              0.23
15  vertical_flip                  flip               0.23
14  shift_5_0_rotate_45_blur_3x3   shift_rotate_blur  0.15
4   rotate_200                     rotate             0.13
5   rotate_45                      rotate             0.07
6   rotate_45_blur_3x3             rotate_blur        0.05
7   rotate_90                      rotate             0.04
8   rotate_90_blur_5x5             rotate_blur        0.04
10  shift_-5_0_rotate_90_blur_5x5  shift_rotate_blur  0.01
grader.check("q4i")
q4i
passed! 💯
Problem 4j: Analysis of Augmentation Techniques¶
Among the augmentation techniques, which performed the best and which performed the worst? Why do you think this is the case? Provide reasoning based on the nature of the augmentations and their impact on the model's ability to generalize.
Answer:
Based on the aug_performance DataFrame results:
Best performing augmentations:
- The unaugmented baseline scores highest (none, 0.88), and among the augmentations, blur_3x3 (0.85) and blur_5x5 (0.77) perform best, followed by horizontal_flip (0.58) and the vertical shifts.
- Reasoning: A mild blur preserves the overall silhouette and intensity pattern of each garment, so the features the model learned still largely apply. A horizontal flip keeps many clothing classes visually plausible, since items like shirts and trousers are roughly left-right symmetric.
Worst performing augmentations:
- Rotations and their compositions score lowest: rotate_90 (0.04), rotate_90_blur_5x5 (0.04), and shift_-5_0_rotate_90_blur_5x5 (0.01).
- Reasoning:
  - Large rotations: Significantly alter the spatial relationships that the model relies on for classification; the training data contains only upright garments.
  - Blur: Reduces image sharpness and detail, making it harder for the model to distinguish fine-grained features.
  - Complex compositions: Multiple transformations compound their individual effects, further distorting the image.
General observations:
- Transformations that preserve local pixel neighborhoods (mild blur, flips, small shifts) tend to perform better.
- Transformations that introduce significant geometric distortion (large rotations) degrade performance the most, and composing them with shifts and blur degrades it further.
- The model's performance reflects its sensitivity to the specific features it learned during training.
You will be doing a LOT of matrix multiplication this semester, so get comfortable with these operations: they are fundamental to many machine learning algorithms you'll encounter!
Before you submit, ensure save_models is true¶
assert save_models and load_saved_models, "save_models and load_saved_models must be True"
assert os.path.exists('classifier.joblib'), "classifier.joblib should exist"
Now that we have gotten familiar with pandas, numpy, and the classic training loop let's look into how we can debug and improve classifiers!
Submission¶
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!
## Use this cell if you are running the notebook in Google Colab to install the necessary dependencies, this may take a few minutes
if IS_COLAB:
!apt-get install -y texlive texlive-xetex pandoc
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True, files=['classifier.joblib'])